Variant Discovery    ◾    127

genotype is called and assigned to the sample, and positions with putative variants (substitutions or InDels) are written into the VCF file.
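To see what a caller has written, the data lines of the resulting VCF file can be tallied by variant type. The sketch below assumes a plain-text (uncompressed) VCF and applies a simplified rule of our own: an ALT allele with the same length as REF is counted as a substitution, and any length difference is counted as an InDel. Multi-base substitutions (MNPs) and some complex events therefore fall into the substitution count. The helper name `count_variant_types` is hypothetical.

```shell
#!/usr/bin/env bash
# count_variant_types: hypothetical helper that tallies ALT alleles in an
# uncompressed VCF as substitutions (same length as REF) or InDels
# (different length). Symbolic alleles such as <DEL> or "*" are skipped.
count_variant_types() {
  awk -F'\t' '
    /^#/ { next }                         # skip meta-information and header lines
    {
      n = split($5, alts, ",")            # column 5 (ALT) may list several alleles
      for (i = 1; i <= n; i++) {
        if (alts[i] ~ /^[ACGTNacgtn]+$/) {
          if (length(alts[i]) == length($4)) subs++   # column 4 is REF
          else indels++
        }
      }
    }
    END { printf "substitutions\t%d\nindels\t%d\n", subs, indels }
  ' "$1"
}
```

For example, `count_variant_types ../variants/sarscov2.vcf` prints one line per category with its allele count.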

In the following, we will perform variant calling with both FreeBayes and GATK, which are examples of haplotype-based variant callers.

4.2.2.1  FreeBayes Variant Calling Pipeline

FreeBayes [7] is a haplotype-based Bayesian variant detector used to detect small variants such as single- and multi-nucleotide polymorphisms (SNPs and MNPs) and InDels. On Debian-based Linux distributions such as Ubuntu, we can install FreeBayes with the apt package manager:

sudo apt update
sudo apt install freebayes
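Before running the pipeline, it is worth confirming that the required executables are on the PATH. The small helper below is a hypothetical convenience of ours, not part of FreeBayes: it reports every missing command on standard error and returns the number of missing tools, so a return value of 0 means everything was found.

```shell
#!/usr/bin/env bash
# require_tools: hypothetical helper that checks each named command is
# available on PATH. Prints the missing ones to stderr and returns the
# number of missing tools (0 means all were found).
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

A pipeline script could start with `require_tools freebayes fasterq-dump || exit 1` so that it stops early instead of failing halfway through.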

Use the following command to read more about FreeBayes usage and options:

freebayes --help

For variant calling, we will use the following form:

freebayes \
-f ../ref/GCF_009858895.2_ASM985889v3_genomic.fna \
-C 5 \
-L bam_list.txt \
-v ../variants/sarscov2.vcf

The “-f” option specifies the reference genome (FASTA) file; “-C” sets the minimum number of observations supporting an alternate allele within a single individual that is required to evaluate the position; “-L” passes the name of a text file listing the BAM files to analyze (one file name per line); and “-v” specifies the name of the output VCF file.
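The file passed with “-L” is a plain-text list of BAM paths, one per line. A minimal way to build it, assuming the sorted BAM files sit in one directory and end in “.bam”, is to let the shell expand a glob and print each match on its own line. The helper name `make_bam_list` and the directory used in the usage line are hypothetical; the helper refuses to write an empty list.

```shell
#!/usr/bin/env bash
# make_bam_list: hypothetical helper that writes every *.bam file found in
# directory $1 to the list file $2, one path per line (for freebayes -L).
make_bam_list() {
  local dir=$1 out=$2
  local bams=("$dir"/*.bam)
  # With no match the glob stays literal, so check that the first entry exists.
  if [ ! -e "${bams[0]}" ]; then
    echo "no BAM files found in $dir" >&2
    return 1
  fi
  printf '%s\n' "${bams[@]}" > "$out"
}
```

For example, `make_bam_list ../bam bam_list.txt` would produce a file suitable for the “-L” option, assuming the BAM files live in `../bam`.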

We will now use FreeBayes to identify variants in the SARS-CoV-2 samples, following the same steps we used for “bcftools” above. First, we create a project directory and store the run IDs in a file “ids.txt”, as before. Then, we save the following script in a file “pipeline_freebayes.sh” and execute it with “bash pipeline_freebayes.sh”:

#!/bin/bash
# SARS-CoV-2 variant calling
#-------------------------
# 1- download FASTQ files from the NCBI SRA database
mkdir -p fastq                  # -p: no error if fastq/ already exists
while read -r f; do             # one SRA run ID per line in ids.txt
  fasterq-dump --progress --outdir fastq "$f"
done < ids.txt